The 10-armed testbed: learn by implementing!

Author: Rodrigo Chang

I was reading section 2.3 of Sutton and Barto's book on Reinforcement Learning (2nd ed.), where they explain the k-armed bandit problem, and I realized that it might not be so hard to implement an interactive version of figure 2.2 of the book. I share my work here to show that you can sometimes implement something as you read, and that doing so is a great way to learn and practice the concepts you are studying.

My workflow:

  • I created the module Bandits and started writing functions and trying them out in the following cells.

  • When I was happy with the result, I added the interactivity and plots.

Have fun!

ϵ = 0.1

You like to explore a little!
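The ϵ slider controls how often the agent explores instead of exploiting its current estimates. As a rough illustration, here is a minimal Python sketch of ϵ-greedy action selection. It is not the code of the notebook's Bandits module; the function name `epsilon_greedy` and its parameters are hypothetical and chosen only for this example.

```python
import random


def epsilon_greedy(q_estimates, epsilon, rng=random):
    """Pick an arm index: explore with probability epsilon, else exploit.

    q_estimates: list of current value estimates, one per arm (assumed name).
    epsilon:     exploration probability in [0, 1].
    rng:         the random module or a random.Random instance.
    """
    if rng.random() < epsilon:
        # Explore: choose any arm uniformly at random.
        return rng.randrange(len(q_estimates))
    # Exploit: choose an arm with the highest estimate, breaking ties at random.
    best = max(q_estimates)
    return rng.choice([a for a, q in enumerate(q_estimates) if q == best])
```

With ϵ = 0.1 the agent picks a random arm on about one step in ten and otherwise takes the arm that currently looks best; with ϵ = 0 it never explores.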

Number of bandit arms k = 10

Steps = 1000

Episodes = 2000
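To make the roles of these parameters concrete, here is a minimal Python sketch of the testbed experiment from section 2.3 of the book. It is an illustration under stated assumptions, not the notebook's Bandits module: the function name `run_testbed` and its arguments are hypothetical, each episode draws true arm values from a standard normal distribution, rewards are normal with unit variance around the chosen arm's true value, and estimates are updated with incremental sample averages.

```python
import random


def run_testbed(k=10, steps=1000, episodes=2000, epsilon=0.1, seed=0):
    """Return the average reward per step over independent bandit problems."""
    rng = random.Random(seed)
    avg_reward = [0.0] * steps
    for _ in range(episodes):
        # A fresh bandit problem: true arm values drawn from N(0, 1).
        q_true = [rng.gauss(0.0, 1.0) for _ in range(k)]
        q_est = [0.0] * k   # sample-average estimates, initialised to zero
        counts = [0] * k    # number of times each arm has been pulled
        for t in range(steps):
            if rng.random() < epsilon:
                # Explore: pick an arm uniformly at random.
                a = rng.randrange(k)
            else:
                # Exploit: pick a best-looking arm, breaking ties at random.
                best = max(q_est)
                a = rng.choice([i for i in range(k) if q_est[i] == best])
            # Reward is noisy around the chosen arm's true value.
            reward = rng.gauss(q_true[a], 1.0)
            counts[a] += 1
            # Incremental sample-average update of the estimate.
            q_est[a] += (reward - q_est[a]) / counts[a]
            avg_reward[t] += reward / episodes
    return avg_reward
```

Calling `run_testbed()` with the defaults above matches the slider values shown here (k = 10, 1000 steps, 2000 episodes, ϵ = 0.1) and returns one averaged reward per step, which is the kind of curve plotted in figure 2.2. In pure Python the full run takes a while, so smaller values of `steps` and `episodes` are handy for quick experiments.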
